Evaluation measure for group-based record linkage
Similar resources
Group based Self Training for E-Commerce Product Record Linkage
In this paper, we study the task of product record linkage across multiple e-commerce websites. We solve this task via a semi-supervised approach and adopt the self-training algorithm for learning with little labeled data. In previous self-training algorithms, the learner tries to convert the most confidently predicted unlabeled examples of each class into labeled training examples. However, th...
Multiple Instance Learning for Group Record Linkage
Record linkage is the process of identifying records that refer to the same entities from different data sources. While most research efforts are concerned with linking individual records, new approaches have recently been proposed to link groups of records across databases. Group record linkage aims to determine if two groups of records in two databases refer to the same entity or not. One app...
Validating Distance-Based Record Linkage with Probabilistic Record Linkage
This work compares two alternative methods for record linkage: distance based and probabilistic record linkage. It compares the performance of both approaches when data is categorical. To that end, a distance over ordinal and nominal scales is defined. The paper shows that, for categorical data, distance-based and probabilistic-based record linkage lead to similar results in relation to the num...
Behavior Based Record Linkage
In this paper, we present a new record linkage approach that uses entity behavior to decide if potentially different entities are in fact the same. An entity’s behavior is extracted from a transaction log that records the actions of this entity with respect to a given data source. The core of our approach is a technique that merges the behavior of two possible matched entities and computes the ...
Record Linkage I: Evaluation of Commercially Available Record Linkage Software for Use in NASS
Record linkage is an important technique in NASS for minimizing the presence of duplicate names on its list sampling frame of farm operators and agribusinesses. In the late 1970s, NASS developed an automated record linkage system which runs on an IBM mainframe for this purpose. With changes in technology, the need has arisen for portability between platforms, integration with client/server te...
Journal
Journal title: International Journal of Population Data Science
Year: 2020
ISSN: 2399-4908
DOI: 10.23889/ijpds.v4i1.1127